A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces
Abstract
Finding the nearest neighbor among a large collection of high-dimensional vectors can be a computationally demanding task. In this paper, we pursue fast vector matching by representing vectors in $\mathbb{R}^n$ with lower-dimensional projections in $\mathbb{R}^m$, $m \le n$. The key to creating and using the representative vectors is a lower bound on the Euclidean distance between arbitrary vectors in $\mathbb{R}^n$, based on the submultiplicative property of induced matrix norms. For any non-zero projection matrix $A \in \mathbb{R}^{m \times n}$, the bound is proportional to the distance between the projected vectors. We study other existing bounds involving orthogonal transforms and piecewise constant approximation maps in light of this formulation. Additionally, we address the question of how to optimize the projection matrix for a given dataset in order to make the bound as tight as possible. Experimental results on a speech database show that exact nearest neighbor computation can be accelerated by a factor of 5 using the proposed bound.
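The bound described in the abstract can be illustrated with a short sketch. Submultiplicativity gives $\|Ax - Ay\|_2 \le \|A\|_2 \, \|x - y\|_2$, hence $\|x - y\|_2 \ge \|Ax - Ay\|_2 / \|A\|_2$ for any non-zero $A$. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the induced 2-norm (the abstract does not say which induced norm is used), uses a random Gaussian projection rather than an optimized one, and the function names `lower_bound` and `nearest_neighbor` are hypothetical.

```python
import numpy as np

def lower_bound(A, x, y, A_norm=None):
    """Lower bound on ||x - y||_2 computed from the projections Ax and Ay."""
    if A_norm is None:
        # Induced 2-norm of A, i.e. its largest singular value (assumed norm).
        A_norm = np.linalg.norm(A, 2)
    return np.linalg.norm(A @ x - A @ y) / A_norm

def nearest_neighbor(query, data, A):
    """Exact nearest neighbor search that skips candidates using the bound.

    Returns (index, distance, number of full-dimensional distance evaluations).
    """
    A_norm = np.linalg.norm(A, 2)
    projected = data @ A.T            # could be precomputed once per dataset
    q_proj = A @ query
    bounds = np.linalg.norm(projected - q_proj, axis=1) / A_norm

    best_idx, best_dist, full_evals = -1, np.inf, 0
    # Visit candidates in order of increasing lower bound.
    for i in np.argsort(bounds):
        if bounds[i] >= best_dist:
            # Every remaining candidate has a bound at least this large,
            # so none of them can be strictly closer than the current best.
            break
        d = np.linalg.norm(data[i] - query)
        full_evals += 1
        if d < best_dist:
            best_idx, best_dist = int(i), d
    return best_idx, best_dist, full_evals

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 32))
A = rng.normal(size=(8, 32))          # random projection, an assumption
query = rng.normal(size=32)
idx, dist, evals = nearest_neighbor(query, data, A)
print(f"nearest index {idx}, distance {dist:.4f}, full evaluations {evals}")
```

How many full-dimensional distance computations are skipped depends on how tight the bound is for the data at hand, which is why the choice of $A$ matters; the random matrix here is only a placeholder for a projection tuned to the dataset.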
Similar resources
Classification, with Applications to Object and Shape Recognition in Image Databases
Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time,...
Significance-Sensitive Nearest-Neighbor Search for Efficient Similarity Retrieval of Multimedia Information
Nearest-neighbor search (NN-search) in the feature space is widely used for the similarity retrieval of multimedia information. Each piece of multimedia information is mapped to a vector in a multi-dimensional space where the distance between two vectors (typically, Euclidean distance between the heads of vectors) corresponds to the similarity of multimedia information. Once the feature space i...
Metric-Based Shape Retrieval in Large Databases
This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...
PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces
In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures – the so-called “curse of dimensionality”. This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1 + ε) times the distance between ...
Nearest Neighbor Searching in Image Databases
A frequently encountered type of query in image database systems is to find the k most similar images to a query image with respect to its feature. Processing such queries requires substantially different search algorithms than those for the normal k nearest neighbor problem: dimensionality of the feature may be very high and similarity measure may not be as simple as a Euclidean dist...